A Nonblocking Coordinated Checkpointing Algorithm for Mobile Computing Systems
Abstract
A checkpointing algorithm for mobile computing systems must handle several new issues: mobility, low bandwidth of wireless channels, lack of stable storage on mobile nodes, disconnections, limited battery power, and the high failure rate of mobile nodes. These issues make traditional checkpointing techniques unsuitable for such environments. Minimum-process coordinated checkpointing is an attractive approach for introducing fault tolerance into mobile distributed systems transparently. This approach is domino-free, requires at most two checkpoints of a process on stable storage, and forces only a minimum number of processes to checkpoint. However, it requires extra synchronization messages, blocking of the underlying computation, or taking some useless checkpoints. In this paper, we propose a nonblocking coordinated checkpointing algorithm for mobile computing systems that requires only a minimum number of processes to take permanent checkpoints. We reduce the message complexity compared to the Cao-Singhal algorithm [4], while keeping the number of useless checkpoints unchanged. We also address related issues: failures during checkpointing, disconnections, concurrent initiations of the algorithm, and maintaining exact dependencies among processes. Finally, the paper presents an optimization technique that significantly reduces the number of useless checkpoints at the cost of a minor increase in message complexity. In coordinated checkpointing, if a single process fails to take its tentative checkpoint, the entire checkpointing effort is aborted. We try to reduce this wasted effort by taking soft checkpoints at Mobile Hosts in the first phase.
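The idea of forcing only a minimum number of processes to checkpoint can be illustrated with a small sketch. It is not the algorithm proposed in the paper. It only assumes that each process records which processes it has received messages from since its last checkpoint, and it computes the set of processes the initiator transitively depends on. The function name `minimum_checkpoint_set` and the `depends_on` mapping are hypothetical names chosen for this illustration.

```python
from collections import deque


def minimum_checkpoint_set(initiator, depends_on):
    """Return the processes that must checkpoint along with the initiator.

    depends_on[p] is assumed to hold the processes from which p has
    received at least one message since p's last checkpoint
    (a hypothetical representation, not taken from the paper).
    """
    must_checkpoint = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        # Every process that p directly depends on must also checkpoint,
        # otherwise p's new checkpoint could record an orphan message.
        for q in depends_on.get(p, ()):
            if q not in must_checkpoint:
                must_checkpoint.add(q)
                queue.append(q)
    return must_checkpoint


# Illustrative dependencies: P1 received from P2, P2 received from P3,
# and P4 received from P1.
deps = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": {"P1"}}
result = minimum_checkpoint_set("P1", deps)
```

In this example P4 is left out of the result because the initiator P1 does not depend on it, directly or transitively, so P4 is not forced to checkpoint.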
Similar resources
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and, as a result, the workload of Mobile ...
Efficient Checkpoint-based Failure Recovery Techniques in Mobile Computing Systems
Conventional distributed and domino effect-free failure recovery techniques are inappropriate for mobile computing systems because each mobile host is forced to take a new checkpoint (based on coordinated checkpointing). Otherwise, multiple local checkpoints may need to be stored in stable storage (based on communication-induced checkpointing). Hence, this investigation presents a novel domino ...
An Efficient Time-Based Checkpointing Protocol for Mobile Computing Systems over Mobile IP
Time-based coordinated checkpointing protocols are well suited for mobile computing systems because no explicit coordination message is needed while the advantages of coordinated checkpointing are kept. However, without coordination, every process has to take a checkpoint during a checkpointing process. In this paper, an efficient time-based coordinated checkpointing protocol for mobile computi...
Minimum Process Coordinated Checkpointing Scheme for Ad Hoc Networks
The wireless mobile ad hoc network (MANET) architecture consists of a set of mobile hosts capable of communicating with each other without the assistance of base stations. This has made it possible to create a mobile distributed computing environment and has also brought several new challenges in distributed protocol design. In this paper, we study a very fundamental problem, the fault tol...
A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems
Fault tolerance techniques enable systems to perform tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes do not share memory, a global state of the system is defined as a set of local states, one from each process. In case of a fault in distributed systems, checkpointing enables the execu...